A ridge penalized principal-components approach based on heritability for high-dimensional data.

Authors

  • Yuanjia Wang
  • Yixin Fang
  • Man Jin
Abstract

OBJECTIVE: To develop a ridge penalized principal-components approach based on heritability that can be applied to high-dimensional family data.

METHODS: The first principal component of heritability for a trait constellation is defined as the linear combination of traits that maximizes heritability, which is equivalent to maximizing the family-specific variation relative to the subject-specific variation. To analyze high-dimensional data and prevent overfitting, we propose a penalized principal-components approach based on heritability that adds a ridge penalty to the subject-specific variation. We choose the optimal regularization parameter by cross-validation.

RESULTS: The principal-components approach based on heritability, with and without the ridge penalty, was compared with the usual principal-components analysis in four settings. The penalized principal-components of heritability analysis gave substantially larger coefficients to the traits with a genetic effect than to the traits without one, whereas the non-regularized analysis failed to identify the genetic traits. In addition, linkage analysis on the combined traits showed that the power of the proposed methods was higher than that of the usual principal-components analysis and the non-regularized principal-components of heritability analysis.

CONCLUSIONS: The penalized principal-components approach based on heritability can effectively handle a large number of traits with family structure and provides a gain in power for linkage analysis. The cross-validation procedure performs well in choosing the optimal magnitude of the penalty.
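
Read as a variance ratio, the Methods description becomes a generalized eigenvalue problem: find trait weights a that maximize a'Σ_B a / a'(Σ_W + λI)a. Here Σ_B is the family-specific covariance of the traits, Σ_W is the subject-specific covariance, and λ ≥ 0 is the ridge parameter. Setting λ = 0 gives the non-regularized version, which breaks down when Σ_W is singular, for example when the traits outnumber the within-family degrees of freedom. The Python/NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes balanced families of equal size k and estimates Σ_B and Σ_W with one-way random-effects ANOVA moment estimators. It also assumes that cross-validation scores each λ by the mean held-out variance ratio. The abstract specifies none of these choices, and all function names and the toy simulation are hypothetical.

```python
import numpy as np


def family_covariances(Y):
    """Estimate family-specific (between) and subject-specific (within) trait covariances.

    Assumption: Y has shape (F, k, p): F families, k members each (balanced), p traits.
    Uses one-way random-effects ANOVA moment estimators (an illustrative choice).
    """
    F, k, p = Y.shape
    fam_means = Y.mean(axis=1)                          # (F, p) family means
    dev_b = fam_means - fam_means.mean(axis=0)          # family means around the grand mean
    msb = k * dev_b.T @ dev_b / (F - 1)                 # between-family mean square
    dev_w = (Y - fam_means[:, None, :]).reshape(-1, p)  # members around their own family mean
    msw = dev_w.T @ dev_w / (F * (k - 1))               # within-family mean square
    sigma_w = msw                                       # subject-specific covariance
    sigma_b = (msb - msw) / k                           # family-specific covariance (may be indefinite)
    return sigma_b, sigma_w


def ridge_heritability_pc(sigma_b, sigma_w, lam):
    """Unit-norm weights a maximizing a' sigma_b a / a' (sigma_w + lam * I) a."""
    p = sigma_w.shape[0]
    L = np.linalg.cholesky(sigma_w + lam * np.eye(p))   # needs a positive definite denominator
    L_inv = np.linalg.inv(L)
    M = L_inv @ sigma_b @ L_inv.T                       # whitened family-specific covariance
    M = (M + M.T) / 2                                   # remove rounding asymmetry
    _, vecs = np.linalg.eigh(M)                         # eigenvalues in ascending order
    a = L_inv.T @ vecs[:, -1]                           # map the top eigenvector back to trait space
    return a / np.linalg.norm(a)


def variance_ratio(a, sigma_b, sigma_w):
    """Family-specific over subject-specific variance of the combined trait Y @ a."""
    return (a @ sigma_b @ a) / (a @ sigma_w @ a)


def choose_lambda_cv(Y, lams, n_folds=5, seed=0):
    """Pick lam by K-fold CV over whole families (assumed criterion: mean held-out ratio)."""
    F = Y.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(F), n_folds)
    scores = []
    for lam in lams:
        total = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(F), test_idx)
            a = ridge_heritability_pc(*family_covariances(Y[train_idx]), lam)
            total += variance_ratio(a, *family_covariances(Y[test_idx]))
        scores.append(total / n_folds)
    return lams[int(np.argmax(scores))], scores


def simulate_families(F=40, k=2, p=60, n_genetic=5, effect_sd=2.0, seed=1):
    """Hypothetical toy data: the first n_genetic traits share a family-level effect."""
    rng = np.random.default_rng(seed)
    loadings = np.r_[np.ones(n_genetic), np.zeros(p - n_genetic)]      # (p,)
    family_effect = effect_sd * rng.normal(size=(F, 1, 1)) * loadings  # (F, 1, p)
    return family_effect + rng.normal(size=(F, k, p))                  # (F, k, p)


if __name__ == "__main__":
    Y = simulate_families()  # 60 traits but only 40 within-family degrees of freedom
    lam, _ = choose_lambda_cv(Y, [0.1, 1.0, 10.0])
    a = ridge_heritability_pc(*family_covariances(Y), lam)
    print("chosen lambda:", lam)
    print("share of squared weight on the 5 genetic traits:", np.sum(a[:5] ** 2))
```

Cholesky whitening turns the generalized problem into an ordinary symmetric eigenproblem using only NumPy. `scipy.linalg.eigh(A, B)` would solve the same generalized problem directly. Cross-validation splits whole families rather than individuals, so that relatives never fall on both sides of a split.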

Similar resources

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...

Clustering and principal-components approach based on heritability for mapping multiple gene expressions

When the number of phenotypes in a genetic study is on the scale of thousands, such as in studies concerning thousands of gene expression levels, the single-trait analysis is computationally intensive, and heavy adjustment of multiple comparisons is required. Traditional multivariate genetic linkage analysis for quantitative traits focuses on mapping only a few phenotypes and is not feasible fo...

Penalized Normal Likelihood and Ridge Regularization of Correlation and Covariance Matrices

High dimensionality causes problems in various areas of statistics. A particular situation that rarely has been considered is the testing of hypotheses about multivariate regression models in which the dimension of the multivariate response is large. In this article a ridge regularization approach is proposed in which either the covariance or the correlation matrix is regularized to ensure nons...

Methods for regression analysis in high-dimensional data

By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

Pdmclass Function to Classify Microarray Data Using Penalized Discriminant Methods

Description This function is used to classify microarray data. Since the underlying model fit is based on penalized discriminant methods, there is no need for a pre-filtering step to reduce the number of genes. Usage pdmClass(formula , method = c("pls", "pcr", "ridge"), keep.fitted = Arguments formula A symbolic description of the model to be fit. Details given below. method One of "pls", "pcr"...

Journal:
  • Human Heredity

Volume 64, Issue 3

Publication date: 2007